
feat(B-0855): self-registration fires LAST + idempotent across reboots + de-duped against in-flight PRs — Otto-pushes-across-finish-line (Aaron 2026-05-27 architectural fix to B-0812)#5412

Merged
AceHack merged 1 commit into main from backlog/b-0855-self-registration-fires-last-idempotent-deduped-2026-05-27 on May 27, 2026

@AceHack AceHack commented May 27, 2026

Summary

Architectural fix to B-0812 (iter-5.4.1 self-registration), per Aaron's 2026-05-27 empirical anchor: PR #5408 auto-opened mid-install for node-0fe6eb; the install then failed on the nixos-install --fallback bug (fix-forward in PR #5410), leaving the registration PR orphaned for a node ID that never came up.

Operator: "how did it register before it even rebooted? it should not register until the last step when everything comes up and if it reboots it should not register over and over... cluster should realize it's register or has a pr in flight for register and not duplicate."

4 architectural changes

  1. Move self-registration LAST — OUT of zeta-install.sh Step 6.9; INTO systemd oneshot service that fires on FIRST BOOT of installed OS, AFTER network-online + creds-restore + cluster reachable
  2. Idempotency — marker file (~/.config/zeta/self-registered.marker) + upstream check (maintainers/<op>/cluster-nodes/<node>/node.yaml) + in-flight PR check before composing new PR
  3. Coordination via Path B (Otto-pushes-PR-across-finish-line) — per Aaron's simpler-form preference ("we can just worry about one you otto pushing the pr across the finish line on bootup"); Path A (/tmp folder standard) deferred to future row
  4. De-dup — idempotent branch naming + in-flight detection + comment-on-existing-PR instead of opening duplicates
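
The checks in changes 2 and 4 can be read as one decision made before any PR is composed. The following is a minimal TypeScript sketch of that decision, not the actual B-0855.2 module: the `RegistrationState` shape, the `decideRegistration` name, and the action names are hypothetical, and the check order (marker, then upstream node.yaml, then in-flight PR) is an assumption based on the order in which the description lists them.

```typescript
// Hypothetical snapshot of what the first-boot service knows before acting.
interface RegistrationState {
  markerExists: boolean;       // ~/.config/zeta/self-registered.marker is present
  upstreamNodeExists: boolean; // maintainers/<op>/cluster-nodes/<node>/node.yaml exists upstream
  openPrNumber: number | null; // in-flight registration PR for this node, if any
}

// Hypothetical outcomes; the real module may model these differently.
type RegistrationAction =
  | { kind: "skip"; reason: string }
  | { kind: "comment-on-existing"; pr: number }
  | { kind: "open-pr" };

function decideRegistration(s: RegistrationState): RegistrationAction {
  // Cheapest check first: a local marker means this node already registered.
  if (s.markerExists) return { kind: "skip", reason: "marker present" };
  // The marker may be lost (for example after a reinstall), so also ask upstream.
  if (s.upstreamNodeExists) return { kind: "skip", reason: "already registered upstream" };
  // An open PR for this node is adopted instead of duplicated.
  if (s.openPrNumber !== null) return { kind: "comment-on-existing", pr: s.openPrNumber };
  return { kind: "open-pr" };
}
```

Keeping the decision a pure function of the gathered state makes each reboot scenario in the test plan checkable without touching the filesystem or GitHub.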

Composes with

  • B-0812 (parent ancestry) — REFINES (doesn't replace); fixes WHEN + HOW
  • B-0813 — ArgoCD reconciliation; unchanged
  • B-0835 — installer-config-bugs canonical bag (Bug 10)
  • B-0850 — multi-vendor systemd substrate (Otto's tick runs as systemd)
  • B-0851 — persona-first scheduler (Otto's PR-push is scheduled)
  • B-0852 — cred persistence (restored creds are pre-condition for self-register service)

7 sub-rows enumerated

B-0855.1 NixOS module → B-0855.2 TS module + marker → B-0855.4 marker schema → B-0855.5 remove Step 6.9 → B-0855.3 Otto integration → B-0855.6 empirical test → B-0855.7 substrate landing.
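
The marker schema (B-0855.4) is not shown in this PR. As a rough illustration of what a schema-checked marker could look like, here is a TypeScript sketch in which every field name, the JSON encoding, and the `parseMarker` helper are assumptions rather than the project's actual format.

```typescript
// Hypothetical marker schema; the real B-0855.4 schema is not part of this PR.
interface SelfRegisteredMarker {
  schemaVersion: 1;
  nodeId: string;       // for example "node-0fe6eb"
  registeredAt: string; // timestamp parseable by Date.parse
  prNumber: number;     // registration PR that was opened or adopted
}

// Returns the parsed marker, or null if the text is not a valid marker.
function parseMarker(text: string): SelfRegisteredMarker | null {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return null; // corrupt marker: treat it as absent
  }
  if (typeof raw !== "object" || raw === null) return null;
  const m = raw as Record<string, unknown>;
  if (m.schemaVersion !== 1) return null;
  if (typeof m.nodeId !== "string" || m.nodeId.length === 0) return null;
  if (typeof m.registeredAt !== "string" || Number.isNaN(Date.parse(m.registeredAt))) return null;
  if (typeof m.prNumber !== "number" || !Number.isInteger(m.prNumber)) return null;
  return {
    schemaVersion: 1,
    nodeId: m.nodeId,
    registeredAt: m.registeredAt,
    prNumber: m.prNumber,
  };
}
```

Treating an unreadable marker as absent is safe only because the upstream and in-flight PR checks still run afterwards, so a corrupt file alone should not produce a duplicate PR.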

PR #5408 closed

Closed substrate-honestly, with a cross-link to this row (B-0855).

Test plan

  • Fresh USB + clean install + first boot: register fires ONCE; marker; PR opens
  • Reboot installed OS: register does NOT re-fire (marker present); no new PR
  • In-flight PR detection: existing open PR → Otto comments + monitors; no duplicate
  • Install failure path: nixos-install fails → NO registration PR opens (registration is post-install-success)
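
The four scenarios above can be modelled in memory before the empirical USB test (B-0855.6). The sketch below is a toy simulation, not the real service: `FakeCluster`, `FakeNode`, and the returned outcome strings are hypothetical, and writing the marker after commenting on an existing PR is an assumption the PR does not state.

```typescript
// Toy model of the cluster's view of registration PRs (all names hypothetical).
class FakeCluster {
  openPrs: string[] = [];  // node IDs that have an open registration PR
  comments: string[] = []; // node IDs whose existing PR received a comment
}

class FakeNode {
  nodeId: string;
  installSucceeded = false;
  markerPresent = false;

  constructor(nodeId: string) {
    this.nodeId = nodeId;
  }

  // Models the installer. Registration is no longer part of this step.
  install(ok: boolean): void {
    this.installSucceeded = ok;
  }

  // Models one boot of the installed OS and reports what the oneshot did.
  boot(cluster: FakeCluster): "not-bootable" | "skipped" | "commented" | "opened" {
    if (!this.installSucceeded) return "not-bootable"; // the service never runs
    if (this.markerPresent) return "skipped";          // reboot: marker short-circuits
    this.markerPresent = true; // assumption: marker is written in both cases below
    if (cluster.openPrs.includes(this.nodeId)) {
      cluster.comments.push(this.nodeId); // adopt the in-flight PR
      return "commented";
    }
    cluster.openPrs.push(this.nodeId);
    return "opened";
  }
}
```

For example, a node that installs cleanly and boots twice yields "opened" then "skipped", while a node whose install failed never contributes a PR at all.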

🤖 Generated with Claude Code

…aron 2026-05-27 architectural fix to B-0812)

Empirical anchor: PR #5408 auto-opened mid-install for node-0fe6eb; install
then failed downstream at nixos-install --fallback bug (PR #5410 fix-fwd);
registration PR orphaned for a node-id that never came up.

Operator framing: "how did it register before it even rebooted? it should
not register until the last step when everything comes up and if it
reboots it should not register over and over... cluster should realize
it's register or has a pr in flight for register and not duplicate."

4 architectural changes:
1. Move self-registration OUT of zeta-install.sh Step 6.9 INTO systemd
   oneshot service that fires on first boot of installed OS, AFTER
   network-online + creds-restore + cluster reachable
2. Idempotency: marker file + upstream check + in-flight PR check before
   composing new PR
3. Cluster-agent coordination via Path B (Otto-pushes-PR-across-finish-line;
   per Aaron's simpler-form preference); Path A (/tmp folder standard)
   deferred to future row
4. De-dup: idempotent branch naming + in-flight detection + comment-on-existing
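
One way to read "idempotent branch naming" is that the branch name is a pure function of the node's identity, so a retry or reboot computes the same branch and can find its own in-flight PR by head branch. The TypeScript sketch below illustrates that idea only; the `self-register/` prefix, the `registrationBranch` and `findInFlightPr` helpers, and the `OpenPr` shape are hypothetical and do not describe the repository's actual naming scheme.

```typescript
// Hypothetical scheme: the branch depends only on operator and node ID,
// never on a timestamp or random suffix.
function registrationBranch(operator: string, nodeId: string): string {
  const slug = (s: string): string =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  return `self-register/${slug(operator)}/${slug(nodeId)}`;
}

// Minimal view of an open PR, as it might be fetched before composing a new one.
interface OpenPr {
  number: number;
  headBranch: string;
}

// Looks for an in-flight registration PR by its deterministic head branch.
function findInFlightPr(openPrs: OpenPr[], branch: string): OpenPr | undefined {
  return openPrs.find((pr) => pr.headBranch === branch);
}
```

Because the name is deterministic, detecting a duplicate reduces to an exact string match on the head branch instead of parsing PR titles.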

7 sub-rows B-0855.1-7 enumerated. Refines B-0812 (does not replace);
keeps B-0813 ArgoCD reconciliation unchanged.

Composes with B-0812 + B-0813 + B-0835 (Bug 10) + B-0850 (systemd
substrate) + B-0851 (persona-first scheduler) + B-0852 (cred persistence
as pre-condition).

PR #5408 closed substrate-honestly with cross-link to this row.
Copilot AI review requested due to automatic review settings May 27, 2026 06:56
@AceHack AceHack enabled auto-merge (squash) May 27, 2026 06:56
Copilot AI left a comment

Pull request overview

Adds a new P1 backlog row (B-0855) documenting an architectural fix for node self-registration so it fires on first boot (post-install), becomes idempotent across reboots, and avoids duplicate/in-flight registration PRs.

Changes:

  • Added new per-row backlog doc for B-0855 describing the “registration fires last + idempotent + de-duped” architecture.
  • Updated the backlog index to include B-0855.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/backlog/P1/B-0855-self-registration-fires-LAST-post-install-post-first-boot-idempotent-across-reboots-deduped-against-in-flight-registration-prs-aaron-2026-05-27.md | New backlog row capturing the B-0855 architectural fix scope, rationale, and acceptance criteria. |
| docs/BACKLOG.md | Adds the B-0855 entry to the generated backlog index. |

Comment thread docs/BACKLOG.md
@AceHack AceHack merged commit 3610344 into main May 27, 2026
30 checks passed
@AceHack AceHack deleted the backlog/b-0855-self-registration-fires-last-idempotent-deduped-2026-05-27 branch May 27, 2026 07:01
AceHack added a commit that referenced this pull request May 27, 2026
…0 substrate for Ace migration trajectory (14 sub-steps; 12 declarative-input categories; substrate-anchor for B-0852/0853/0855/0856 cross-refs) (#5420)

* docs(B-0854.1): zeta-install.sh step-state-machine inventory — Phase 0 substrate for Ace migration trajectory

B-0854 sub-row .1 (Phase 0; smallest pure-analysis slice). Documents
the EXISTING imperative bash state-machine in zeta-install.sh so the
B-0854 Phase 2 declarative-Ace-manifest schema can express the same
surface.

Inventory covers:
- Top-level entry (REPO_URL, HOST, ZETA_AUTO_CONFIRM env semantics)
- Step-by-step state machine for all 14 sub-steps (1, 2, 3, 4, 5, 6,
  6.5, 6.55, 6.6, 6.7, 6.8, 6.9, 6.95, 7) with inputs/outputs/side-
  effects/failure-modes/declarative-equivalent per step
- Cross-cutting: operator-prompt accumulation count (7 prompts today;
  B-0852 phase-split target = 1 passphrase prompt)
- Idempotency surface table — informs B-0855 architectural fix scope
- 12 distinct declarative-input categories the Ace manifest must
  capture (Phase 2 sub-row scope)
- Files-generated-during-install table mapping to B-0852.5 cred-
  manifest entries (6 mapped, 3 candidate-expansion items named)

Snapshot date: 2026-05-27 (origin/main 70596a8; PR #5417 cosign
merge). Future refreshes should re-snapshot when zeta-install.sh
changes substantially.

Composes with already-landed substrate-engineering arc:
- B-0852 + sub-rows (cred persistence) — PR #5403/#5411/#5414
- B-0853.1 (cosign signing) — PR #5417 + fix-fwd #5419
- B-0855 (self-register architectural fix) — PR #5412
- B-0856 Path A (deferred /tmp coordination) — PR #5413
- B-0854 parent (Ace migration trajectory) — PR #5405

No code change; pure documentation. Doesn't affect ISO substrate;
batches into substrate-engineering history independent of next ISO
build cycle.

* fix(B-0854.1): escape | inside code spans for MD056 table-column-count compliance

* fix(B-0854.1): 10 Copilot accuracy corrections — verified against actual zeta-install.sh content

PR #5420 Copilot review caught 10 substantive accuracy issues in the
B-0854.1 inventory doc. All 10 verified against origin/main 70596a8's
actual zeta-install.sh content + corrected.

Corrections:
- Name attribution → role-ref ("the human maintainer")
- Step 1 inputs: actual `lsblk -d -p -n -o NAME,TYPE,RM,RO,TRAN` + awk
  filter (not made-up NAME,SIZE,MODEL,TRAN,ROTA)
- Step 3 side effects: `sgdisk --zap-all` only (not `wipefs -af` too)
- Step 4: actual `sgdisk` (NOT `parted`); GPT layout via -n + -t flags;
  whole-disk longhorn partitions on DATA_DISKS too
- Step 6: `nixos-generate-config --root /mnt --force` (NOT
  --no-filesystems; --force overwrites existing config)
- Step 6.5: no MAGIC_NUMBER (didn't exist in script); INJECT_OK gate
  flag; iter-4 v1 manual-config-edit fallback path
- Step 6.9: SELF_REG_OK flag; documented graceful-skip path lines 731+
- nixos-install: actual line ~1004 (NOT 1096-1340); section renamed
  to "nixos-install (the actual build; ~line 1004)" since the prior
  range was wrong
- Step 7: actual lines 1261-1336 (NOT 1341-1352); banner driven by
  GH_AUTH_OK/GH_KEY_COUNT/INJECT_OK/SELF_REG_OK (NOT MAGIC_NUMBER);
  conditional sections listed in declarative equivalent

Resolves 10 Copilot threads on PR #5420.

Root cause of the inaccuracies: original draft was written from
`grep -E "^# ── Step"` summaries + recollection of script behavior,
not careful per-step body reads. Discipline lesson: when authoring
substrate-anchor docs claiming to inventory existing code, the read
must be careful per-line, not skim-grep summary. Composes with
.claude/rules/verify-existing-substrate-before-authoring.md at the
inventory-substrate scope (verify-content-of-thing-being-inventoried
before authoring claims about its content).

---------

Co-authored-by: Lior <lior@zeta.dev>